Queuing models

Distribution of request inter-arrival times

The problem: Let us assume that we want to model a system consisting of many users and a server, and that we are interested in the response time of the server, that is, the time from the moment a user sends a request to the server until the moment the server has completed its answer (we ignore any communication delays).

Now, let us assume that the service time of the server is a constant (1 second), that is, it always takes exactly 1 second to provide an answer to a request. This means the distribution of the service time is a delta-function with a peak at t = 1 second.

If the users organize themselves in such a way that the next request will be sent to the server only after the last answer was received, then the response time will always be equal to the service time. However, if the users do not coordinate their requests, it may happen that a user sends a request before the last answer was produced. This means that the server is busy when the request arrives and the request has to wait (in some kind of input queue) before it will be processed. Therefore the response time, in this case, will be larger than the service time (service time plus waiting time). We see that the arrival pattern of the requests has an impact on the average response time of the system.

Poisson inter-arrival pattern: It can be shown that, if there are very many users without any coordination among them, the inter-arrival time (that is, the time from the arrival of one request to the arrival of the next request) has an exponential distribution of the form P(t) = alpha * exp(- alpha * t), where alpha is the arrival rate. In fact, the average of this inter-arrival distribution is 1/alpha, which is the average time between two consecutive requests. We see that in this case there is a good probability that the inter-arrival time is much smaller than the average. Therefore it would be interesting to determine what the average waiting time of requests in the server input queue would be. An answer is given by queuing models.
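This distribution is easy to explore numerically. The following minimal Python sketch (the value of alpha is an arbitrary choice for illustration) samples exponential inter-arrival times and shows that well over half of the gaps are shorter than the average - the crowding that causes queuing:

```python
import random

random.seed(42)
alpha = 2.0          # arrival rate (requests per second); arbitrary example value
n = 100_000

# random.expovariate(alpha) draws from the density alpha * exp(-alpha * t)
samples = [random.expovariate(alpha) for _ in range(n)]

mean = sum(samples) / n
frac_below_mean = sum(1 for t in samples if t < mean) / n
print(f"mean inter-arrival time: {mean:.3f} s (theory: {1/alpha:.3f} s)")
print(f"fraction of gaps shorter than the mean: {frac_below_mean:.2f}")
```

For an exponential distribution, the fraction of samples below the mean is 1 - exp(-1), about 0.63, so the printed fraction comes out near 0.63.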

Important conclusion: One cannot say what the response time of a server is without knowing the inter-arrival pattern of the requests. The performance guarantee of the server depends on the hypothesis about its environment (which determines the distribution of incoming requests).

Queuing models

In the simplest case, a queuing model represents a system where service requests arrive randomly (Poisson arrival process, that is, with an exponential inter-arrival time distribution) and a shared resource is needed for performing the service. If the resource is occupied serving one request, subsequent requests must wait in a queue. The time the resource is required for serving a request (the service time) is supposed to be known (either a fixed value or a given time distribution). Simple analytical formulas exist that may be used to calculate the average waiting time in the queue, the total time a service request remains in the system (this is the response time), the average length of the queue, etc. (for an overview, see "slides on Queuing Models").

For the case of a single server with a queue and with Poisson arrival and exponential service time (the so-called M/M/1 queue), we have the following formulas, where λ is the arrival rate, μ the service rate (1/μ is the average service time) and ρ = λ/μ the utilization of the server (ρ < 1):

  - average response time: T = (1/μ) / (1 - ρ)
  - average waiting time: W = T - 1/μ = (ρ/μ) / (1 - ρ)
  - average number of requests in the system: N = ρ / (1 - ρ)
  - average queue length: Nq = ρ² / (1 - ρ)

The factor 1 / (1 - ρ) is typical of most queuing formulas. It shows that the response time and queue length blow up when the average service time approaches the average inter-arrival time. If it reaches the inter-arrival time or becomes larger, the system is completely overloaded and the average queue length increases without bound.
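The standard M/M/1 results (average response time T = (1/μ) / (1 - ρ), average number in the system N = ρ / (1 - ρ)) can be evaluated directly; a minimal Python sketch:

```python
def mm1_metrics(arrival_rate, service_rate):
    """Average M/M/1 measures; requires utilization rho = lambda/mu < 1."""
    rho = arrival_rate / service_rate              # server utilization
    if rho >= 1.0:
        raise ValueError("overloaded: queue grows without bound")
    response = (1.0 / service_rate) / (1.0 - rho)  # average response time T
    waiting = response - 1.0 / service_rate        # average waiting time W
    in_system = rho / (1.0 - rho)                  # average number in system N
    return rho, response, waiting, in_system

# service time fixed at 1 second (service_rate = 1); increase the load
for lam in (0.5, 0.9, 0.99):
    rho, t, w, n = mm1_metrics(lam, 1.0)
    print(f"rho={rho:.2f}  T={t:.1f}s  W={w:.1f}s  N={n:.1f}")
```

Note how T grows from 2 seconds at ρ = 0.5 to 100 seconds at ρ = 0.99 - the 1 / (1 - ρ) blow-up in action.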

Assumptions: The main assumptions for the formulas of the queuing models are the following (they are not always satisfied - but the formulas often still provide good approximations): (a) independent arrival of individual requests (Poisson arrival) - not satisfied when bursty arrivals (partly self-similar) are considered (see second part of "slides on Queuing Models"), or when requests arrive at regular intervals; (b) the distribution of service times may not be exponential - various other distributions can be considered - exact formulas exist for several of them. If the simplifying assumptions are not satisfied, one often uses simulations to study the performance of a given system (see comments below).
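As an example of point (b): for a constant (deterministic) service time, the exact M/D/1 waiting time follows from the Pollaczek-Khinchine formula and is exactly half the M/M/1 value. A small sketch comparing the two:

```python
def waiting_mm1(rho, service_time):
    # exponential service times: W = rho * S / (1 - rho)
    return rho * service_time / (1.0 - rho)

def waiting_md1(rho, service_time):
    # constant service times (Pollaczek-Khinchine): W = rho * S / (2 * (1 - rho))
    return rho * service_time / (2.0 * (1.0 - rho))

rho, s = 0.9, 1.0
print(waiting_mm1(rho, s))   # about 9 seconds
print(waiting_md1(rho, s))   # about 4.5 seconds: constant service halves the wait
```

Less variability in the service time means less waiting, but the 1 / (1 - ρ) blow-up remains.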

Example: Assume a constant service time equal to 90% of the average inter-arrival time (ρ = 0.9). If the requests arrive at regular time intervals, each request will be processed before the next one arrives - there will be no queuing delay. If the requests arrive independently (according to a Poisson arrival process), the above formula shows that the response time will be 10 times larger than the service time - a queuing delay 9 times larger than the service time. (Strictly speaking, the M/M/1 formula assumes exponential service times; with a constant service time the exact average response time is about 5.5 times the service time - smaller, but still dramatic compared to regular arrivals.)

Simulation programs: (a) one thread per active object - or (b) a program with a future event list

Simulation models: When queuing models are not easily applicable (because the underlying assumptions are not met - or the system is more complicated), one may build simulation models. They can be quite detailed, depending on what aspects are important for the modeling. However, they may also require much CPU power for obtaining statistically valid results.

There are basically two approaches to writing a simulation program:

  1. (Intuitive) The program contains multiple threads (one for each active object being simulated) - each thread will execute a sleep operation whenever the simulated object performs some actions that take some time in real life (but do not need to be simulated in detail).
  2. (Professional) The simulation program is a sequential program (a single thread) using a queue of future events - the events are ordered according to the future time when they should be performed.

These two approaches use two very different programming structures.
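As an illustration of the second (future event list) approach, here is a minimal event-driven simulation of a single server with a queue, written in Python (the names and parameter values are invented for illustration; with Poisson arrivals and exponential service times it reproduces the M/M/1 behaviour discussed above):

```python
import heapq
import random

def simulate_queue(arrival_rate, service_rate, num_requests, seed=0):
    """Single-server FIFO queue driven by a future event list: a min-heap of
    (time, sequence number, event kind) tuples, processed in time order."""
    rng = random.Random(seed)
    fel = []                # the future event list
    seq = 0                 # tie-breaker so heap tuples compare unambiguously

    def schedule(time, kind):
        nonlocal seq
        heapq.heappush(fel, (time, seq, kind))
        seq += 1

    schedule(rng.expovariate(arrival_rate), "arrival")
    waiting = []            # arrival times of requests waiting for the server
    server_busy = False
    arrivals_generated = 1
    waits = []              # waiting time of each served request

    while fel:
        now, _, kind = heapq.heappop(fel)
        if kind == "arrival":
            if arrivals_generated < num_requests:   # schedule the next arrival
                schedule(now + rng.expovariate(arrival_rate), "arrival")
                arrivals_generated += 1
            if server_busy:
                waiting.append(now)                 # request must queue
            else:
                server_busy = True                  # served immediately
                waits.append(0.0)
                schedule(now + rng.expovariate(service_rate), "departure")
        else:                                       # departure
            if waiting:
                waits.append(now - waiting.pop(0))  # next request starts service
                schedule(now + rng.expovariate(service_rate), "departure")
            else:
                server_busy = False
    return sum(waits) / len(waits)

# lambda = 0.5, mu = 1: M/M/1 predicts an average wait of rho/(mu - lambda) = 1 s
print(simulate_queue(0.5, 1.0, 50_000))
```

Note that simulated time jumps directly from one event to the next - no thread ever sleeps - which is why this structure needs only one thread and scales much better than the one-thread-per-object approach.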


Copied from SEG-2106 course notes: February 22, 2013